The answer “Raster data is continuous data while vector data is discrete data.” is not complete: a raster of land use type represens a discrete (type) variable, a polygon map with population density represents a continuous variable. The difference lies in spatially continuous variables like elevation or temperature which are more easily represented by raster data, and spatially discrete features, such as houses and roads, which are easier represented by vector data.
The values shown in figure 1.4 are population total associated with their respective counties. Without the county boundaries the meaning disappears: raster pixels do not contain population totals per pixel, population totals over larger regions or populations densities can no longer be derived based on this raster map alone.
(thanks to Jonas Hurst)
Convert the (x, y) point s (10, 2), (-10, -1), (10, -2) and (0, 10) to polar cooridnates
cart2polar = function(x, y){
r = sqrt(x*x + y*y) # compute r (distance from origin)
phi = atan2(y, x) # compute phi (angle between point and positive x axis in rad)
phi_deg = phi * 180 / pi # compute angle in deg
result = c(r, phi_deg)
return(result)
}
cart2polar(10, 2)
## [1] 10.19804 11.30993
cart2polar(-10, -1)
## [1] 10.04988 -174.28941
cart2polar(10, -2)
## [1] 10.19804 -11.30993
cart2polar(0, 10)
## [1] 10 90
Convert the polar (r, phi) points (10, 45°), (0, 100°) and (5, 259°) to Cartesian coordinates
deg2rad = function(angle_degree) {
angle_degree * pi / 180
}
polar2cart = function(r, phi_deg){
# phi must be in degrees
phi_rad = deg2rad(phi_deg) # convert phi in degrees to radians
x = r * cos(phi_rad)
y = r * sin(phi_rad)
c(x, y) # return value
}
polar2cart(10, 45)
## [1] 7.071068 7.071068
polar2cart(0, 100)
## [1] 0 0
polar2cart(5, 259)
## [1] -0.954045 -4.908136
assuming the Earth is a sphere with a radius of 6371 km, compute for (lambda, phi) points the great circle distance between (10, 10) and (11, 10), between (10, 80) >and (11, 80), between (10, 10) and (10, 11) and between (10, 80) and (10, 81).
distOnSphere = function(l1, phi1, l2, phi2, radius) {
l1_rad = deg2rad(l1)
l2_rad = deg2rad(l2)
phi1_rad = deg2rad(phi1)
phi2_rad = deg2rad(phi2)
theta = acos(
sin(phi1_rad) * sin(phi2_rad) +
cos(phi1_rad) * cos(phi2_rad) * cos(abs(l1_rad - l2_rad))
)
radius * theta # return value
}
radius = 3671
distOnSphere(10, 10, 11, 10, radius)
## [1] 63.09763
distOnSphere(10, 80, 11, 80, radius)
## [1] 11.12568
distOnSphere(10, 10, 10, 11, radius)
## [1] 64.07104
distOnSphere(10, 80, 10, 81, radius)
## [1] 64.07104
Unit of all results are kilometers.
(thanks to Jannis Fröhlking)
library(sf)
## Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
x1 <- st_linestring(rbind(c(0,0),c(2,2),c(0,2),c(2,0)))
x2 <- st_polygon(list(rbind(c(3,0),c(5,2),c(3,2),c(5,0),c(3,0))))
plot(c(x1,x2), col = 2:3)
st_is_simple(x1)
## [1] FALSE
st_is_simple(x2)
## [1] FALSE
for(i in c(1,1e3,1e6,1e-2))
print(round(i * c(10.542, 0.01, 45321.789))/i)
## [1] 11 0 45322
## [1] 10.542 0.010 45321.789
## [1] 10.542 0.010 45321.789
## [1] 0 0 45300
Voronoi diagrams have “open polygons”, areas that extend into infinity, for boundary points. These cannot be represented by simple feature geometries. st_voronoi chooses a default (square) polygon to limit the extent, which can be enlarged. Alternatively, the extent can be limited using st_intersection on its result:
library(sf)
par(mfrow = c(2,2))
set.seed(133331)
mp = st_multipoint(matrix(runif(20), 10))
plot(st_voronoi(mp), col = NA, border = 'black')
plot(mp, add = TRUE)
title("default extent")
e2 = st_polygon(list(rbind(c(-5,-5), c(5, -5), c(5,5), c(-5, 5), c(-5,-5))))
plot(st_voronoi(mp, envelope = e2), col = NA, border = 'black')
plot(mp, add = TRUE)
title("enlarged envelope")
e3 = st_polygon(list(rbind(c(0,0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
v = st_voronoi(mp) %>% st_collection_extract() # pulls POLYGONs out of GC
plot(st_intersection(v, e3), col = NA, border = 'black', axes=TRUE)
plot(mp, add = TRUE)
title("smaller, intersected envelope")
st_dimension(st_point(c(0,1,1)))
## [1] 0
st_dimension(st_linestring(rbind(c(0,1,1),c(1,1,2))))
## [1] 1
st_dimension(st_polygon(list(rbind(c(0,0,0),c(1,0,0),c(1,1,0),c(0,0,0)))))
## [1] 2
(these are all zero-dimensional geometries because they are points, irrespective the number of dimensions they’re defined in)
line_1 = st_linestring(rbind(c(0,0),c(1,0)))
line_2 = st_linestring(rbind(c(.5,0),c(.5,1)))
plot(line_1,col = "green")
plot(line_2,col = "red", add = TRUE)
st_relate(line_1, line_2)
## [,1]
## [1,] "F01FF0102"
The DE-9IM relation is F01FF0102
(the boundary of a LINESTRING is formed by its two end points)
Yes, but I would say that the set may just contain one polygon, because simple features provide no way of assigning points on the boundary of two adjacent polygons to a single polygon.
# read data
nc <- st_read(system.file("shape/nc.shp", package="sf"))
## Reading layer `nc' from data source
## `/home/edzer/R/x86_64-pc-linux-gnu-library/4.0/sf/shape/nc.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 100 features and 14 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## Geodetic CRS: NAD27
# get intersections
(nc_geom = st_geometry(nc))
## Geometry set for 100 features
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## Geodetic CRS: NAD27
## First 5 geometries:
## MULTIPOLYGON (((-81.47276 36.23436, -81.54084 3...
## MULTIPOLYGON (((-81.23989 36.36536, -81.24069 3...
## MULTIPOLYGON (((-80.45634 36.24256, -80.47639 3...
## MULTIPOLYGON (((-76.00897 36.3196, -76.01735 36...
## MULTIPOLYGON (((-77.21767 36.24098, -77.23461 3...
nc_ints = st_intersection(nc_geom)
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
plot(nc_ints, main = "All intersections")
# Function to check class of intersection objects
get_points = function(x){
if(class(x)[2]=="POINT") return(x)
}
# get points
points = lapply(nc_ints, get_points)
points[sapply(points,is.null)] <- NULL
sf_points = st_sfc(points)
st_crs(sf_points) = st_crs(nc)
# get points with four neighbouring geometries (=states)
touch = st_touches(sf_points, nc_geom)
four_n = sapply(touch, function(y) which(length(y)==4))
names(four_n) = seq_along(four_n)
point_no = array(as.numeric(names(unlist(four_n))))
result = st_sfc(points[point_no])
plot(nc_geom, main = "Points touched by four counties")
plot(result, add = TRUE, col = "red", pch = 10, cex = 2)
A more compact way might be to search for points where counties touch another county only in a point, which can be found using st_relate using a pattern:
(pts = nc %>% st_relate(pattern = "****0****"))
## although coordinates are longitude/latitude, st_relate_pattern assumes that they are planar
## Sparse geometry binary predicate list of length 100, where the
## predicate was `relate_pattern'
## first 10 elements:
## 1: (empty)
## 2: (empty)
## 3: (empty)
## 4: (empty)
## 5: (empty)
## 6: (empty)
## 7: (empty)
## 8: (empty)
## 9: 31
## 10: 26
nc %>% st_relate(pattern = "****0****") %>% lengths() %>% sum()
## although coordinates are longitude/latitude, st_relate_pattern assumes that they are planar
## [1] 28
which is, as expected, four times the number of points shown in the plot above.
How can we find these points? See here:
nc = st_geometry(nc)
s2 = sf_use_s2(FALSE) # use GEOM geometry
## Spherical geometry (s2) switched off
pts = st_intersection(nc, nc)
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
pts = pts[st_dimension(pts) == 0]
plot(st_geometry(nc))
plot(st_geometry(pts), add = TRUE, col = "red", pch = 10, cex = 2)
sf_use_s2(s2) # set back
## Spherical geometry (s2) switched on
Only cells that were fully crossed by the red line would be grey:
library(stars)
## Loading required package: abind
ls = st_sf(a = 2, st_sfc(st_linestring(rbind(c(0.1, 0), c(1, .9)))))
grd = st_as_stars(st_bbox(ls), nx = 10, ny = 10, xlim = c(0, 1.0), ylim = c(0, 1),
values = -1)
attr(grd, "dimensions")$y$delta = .1
attr(grd, "dimensions")$y$offset = 0
r = st_rasterize(ls, grd, options = "ALL_TOUCHED=TRUE")
r[r == -1] = NA
plot(st_geometry(st_as_sf(grd)), border = 'orange', col = NA,
reset = FALSE, key.pos=NULL)
plot(r, axes = TRUE, add = TRUE, breaks = "equal") # ALL_TOUCHED=FALSE;
plot(ls, add = TRUE, col = "red", lwd = 2)
The reason is that in this case, lower left corners of grid cells are part of the cell, rather than upper left corners.
How does the GeoJSON format define “straight” lines between ellipsoidal coordinates (section 3.1.1)? Using this definition of straight, how would LINESTRING(0 85,180 85) look like in a polar projection? How could this geometry be modified to have it cross the North Pole?
GeoJSON defines straight lines between pairs of ellipsoidal coordinates as the straight line in Cartesian space formed by longitude and latitude. This means e.g. that all parallels are straight lines.
Using this definition of straight, how would LINESTRING(0 85,180 85) look like in a polar projection?
Like a half circle:
library(sf)
l <- st_as_sfc("LINESTRING(0 85,180 85)") %>%
st_segmentize(1) %>%
st_set_crs('EPSG:4326')
plot(st_transform(l, 'EPSG:3995'), col = 'red', lwd = 2,
graticule = TRUE, axes = TRUE, reset = FALSE)
How could this geometry be modified to have it cross the North Pole?
One would have to let it pass through (0 90) and (180, 90):
library(sf)
l <- st_as_sfc("LINESTRING(0 85,0 90,180 90,180 85)") %>%
st_segmentize(1) %>%
st_set_crs('EPSG:4326')
plot(st_transform(l, 'EPSG:3995'), col = 'red', lwd = 2,
graticule = TRUE, axes = TRUE, reset = FALSE)
Ring direction (clock-wise CW, counter clock-wise CCW) is unambiguous on \(R^2\) but not on \(S^2\): on \(S^2\) every polygon divides the sphere’s surface in two parts. When the inside of the polygon is taken as the area to the left when traversing the polygons’s points then for a small polygon, then ring direction is CCW if the area of the polygon is smaller than half of the area of the sphere. For polygons dividing the sphere in two equal parts (great circles such as the equator or meridians) ring direction is ambiguous.
Bounding caps may be more compact (have a smaller area compared to the bounding box corresponding to the same geometries), they need fewer parameters, and they are invariant under rotation of (the origins of) longitude and latitude.
For areas covering one of the poles, a bounding box will always need to have a longitude range that spans from -180 to 180, irrespective whether the geometry is centered around the pole.
Because that is the closest approximation of the geometry on \(R^2\).
For rnaturalearth::ne_countries(country = "Fiji", returnclass="sf"), check whether the geometry is valid on \(R^2\), on an orthographic projection centered on the country, and on \(S^2\). How can the geometry be made valid on S^2? Plot the resulting geometry back on \(R^2\). Compare the centroid of the country, as computed on \(R^2\) and on \(S^2\), and the distance between the two.
Valid on \(R^2\):
fi = rnaturalearth::ne_countries(country = "Fiji", returnclass="sf") %>%
st_geometry()
s2 = sf_use_s2(FALSE)
## Spherical geometry (s2) switched off
st_is_valid(fi)
## [1] TRUE
Valid on orthographic projection:
ortho = "+proj=ortho +lon_0=178.6 +lat_0=-17.3"
st_transform(fi, ortho) %>% st_is_valid()
## [1] FALSE
plot(st_transform(fi, ortho), border = 'red')
The red line following the antimeridian makes the geometry invalid in this projection, and also on \(S^2\):
sf_use_s2(TRUE)
## Spherical geometry (s2) switched on
st_is_valid(fi)
## [1] FALSE
Make valid on \(S^2\), and plot:
fi.s2 = st_make_valid(fi)
st_is_valid(fi.s2)
## [1] TRUE
plot(st_transform(fi.s2, ortho), border = 'red')
title("valid")
where we see that the line at the antimeridian has disappeared. This makes plotting in \(R^2\) look terrible, with lines spanning the globe:
plot(fi.s2, axes = TRUE)
Compare the centroid of the country, as computed on \(R^2\) and on \(S^2\), and the distance between the two.
sf_use_s2(FALSE)
## Spherical geometry (s2) switched off
(c1 = st_centroid(fi))
## Warning in st_centroid.sfc(fi): st_centroid does not give correct centroids for
## longitude/latitude data
## Geometry set for 1 feature
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 163.8532 ymin: -17.31631 xmax: 163.8532 ymax: -17.31631
## CRS: +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## POINT (163.8532 -17.31631)
sf_use_s2(TRUE)
## Spherical geometry (s2) switched on
(c2 = st_centroid(fi.s2))
## Geometry set for 1 feature
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 178.5684 ymin: -17.31562 xmax: 178.5684 ymax: -17.31562
## CRS: +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## POINT (178.5684 -17.31562)
st_distance(c1, c2)
## Units: [m]
## [,1]
## [1,] 1561723
sf_use_s2(s2)
StateThe appropriate value would be constant: there is no identity relationship of State to one of the counties in nc, and the value of State is constant through each county in the state (every point in every county in the state has this value for State).
State for the entire stateNow, the unioned geometry is that of the state, and we can assign identity: there is only one state of North Carolina, an this geometry is its geometry.
AREA variableThe nc dataset is rather old, and did not come with an extensive report how, in detail, certain variables such as AREA were derived, so some detective work is needed here. How did people do this, more than three decades ago?
We can now compute area by
library(sf)
nc = read_sf(system.file("gpkg/nc.gpkg", package="sf"))
nc$AREA[1:10]
## [1] 0.114 0.061 0.143 0.070 0.153 0.097 0.062 0.091 0.118 0.124
s2 = sf_use_s2(FALSE) # use spherical geometry:
## Spherical geometry (s2) switched off
nc$area = a_sph = st_area(nc)
nc$area[1:10]
## Units: [m^2]
## [1] 1137388604 611077263 1423489919 694546292 1520740530 967727952
## [7] 615942210 903650119 1179347051 1232769242
sf_use_s2(TRUE) # use ellipsoidal geometry:
## Spherical geometry (s2) switched on
nc$area = a_ell = st_area(nc)
nc$area[1:10]
## Units: [m^2]
## [1] 1137107793 610916077 1423145355 694378925 1520366979 967504822
## [7] 615794941 903423919 1179065710 1232475139
sf_use_s2(s2) # set back to original
cor(a_ell, a_sph)
## [1] 0.9999999
and this gives the area, in square metres, computed using either ellipsoidal or spherical geometry. We see that these are not identical, but nearly perfectly linearly correlated.
A first hypothesis might be a constant factor between the area and AREA variables. For this, we could try a power of 10:
nc$area2 = units::drop_units(nc$area / 1e10)
cor(nc$AREA, nc$area2)
## [1] 0.9998116
summary(lm(area2 ~ AREA, nc))
##
## Call:
## lm(formula = area2 ~ AREA, data = nc)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.281e-03 -6.279e-04 6.328e-05 5.495e-04 2.746e-03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0009781 0.0002692 -3.633 0.000448 ***
## AREA 1.0138124 0.0019882 509.920 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0009733 on 98 degrees of freedom
## Multiple R-squared: 0.9996, Adjusted R-squared: 0.9996
## F-statistic: 2.6e+05 on 1 and 98 DF, p-value: < 2.2e-16
plot(area2 ~ AREA, nc)
abline(0, 1)
and we see a pretty good, close to 1:1 correspondence! But the factor 1e10 is strange: it does not convert square metres into a usual unit for area, neither for metric nor for imperial units.
Also, there are deviations from the 1:1 regression line. Could these be explained by the rounding of AREA to three digits? If rounding to three digits was the only cause of spread around the regression line, we would expect a residual standard error similar to the standard deviation of a uniform distribution with width .001, which is
sqrt(0.001^2/12)
## [1] 0.0002886751
but the one obtained int he regression is three times larger. Also, the units of AREA would be 1e10 \(m^2\), or 1e4 \(km^2\), which is odd and could ring some bells: one degree latitude corresponds roughly to 111 km, so one “square degree” at the equator corresponds roughly to \(1.11^2 \times 10^4\), and at 35 degrees North roughly to
111 ^ 2 * cos(35 / 180 * pi)
## [1] 10092.77
which closely corresponds to the regression slope found above.
We can compute “square degree” area by using the \(R^2\) area routines, e.g. obtained when we set the CRS to NA:
nc2 = nc
st_crs(nc2) = NA
nc2$area = st_area(nc2) # "square degrees"
plot(area ~ AREA, nc2)
abline(0,1)
cor(nc2$area, nc2$AREA)
## [1] 0.999983
summary(lm(area ~ AREA, nc2))
##
## Call:
## lm(formula = area ~ AREA, data = nc2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.471e-04 -2.265e-04 -9.880e-06 2.714e-04 4.594e-04
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.436e-05 7.965e-05 0.934 0.353
## AREA 9.996e-01 5.882e-04 1699.395 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0002879 on 98 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 2.888e+06 on 1 and 98 DF, p-value: < 2.2e-16
We now get a much better fit, a near perfect correlation, and a regression standard error that corresponds exactly to what one would expect after rounding AREA to three digits.
A further “red flag” against the constant (1e10) conversion hypothesis is the spatial pattern of the regression residuals obtained by the first approach:
nc$resid = residuals(lm(area2 ~ AREA, nc))
plot(nc["resid"])
these residuals clearly show a North-South trend, corresponding to the effect that the Earth’s curvature has been ignored during the computation of AREA (ellipsoidal coordinates were treated as if they were Cartesian). “Square degrees” become smaller when going north.
The “unit” of the AREA variable is hence “square degree”. This is a meaningless unit for area on the sphere, because a unit square degree does not have a constant area.
area“area” is of type aggregate: it is a property of a polygon as a whole, not of each individual point in the polygon. It is extensive: if we cut a polygon in two parts, the total area is distributed over the parts.
From the on-line version of the book we get the code that created the plot:
g = st_make_grid(st_bbox(st_as_sfc("LINESTRING(0 0,1 1)")), n = c(2,2))
par(mar = rep(0,4))
plot(g)
plot(g[1] * diag(c(3/4, 1)) + c(0.25, 0.125), add = TRUE, lty = 2)
text(c(.2, .8, .2, .8), c(.2, .2, .8, .8), c(1,2,4,8), col = 'red')
A question is how we can make g into an sf object with the right attribute values associated with the right geometries. We try values 1:4:
sf = st_sf(x = 1:4, geom = g)
plot(sf)
and see the order of the geometries: row-wise, bottom row first, so
sf = st_sf(x = c(1,2,4,8), geom = g)
plot(sf)
gives us the source object. We create target geometries by
dashed = g[1] * diag(c(3/4, 1)) + c(0.25, 0.125)
box = st_union(g)
c(dashed, box)
## Geometry set for 2 features
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 0 ymin: 0 xmax: 1 ymax: 1
## CRS: NA
## POLYGON ((0.25 0.125, 0.625 0.125, 0.625 0.625,...
## POLYGON ((1 0, 0.5 0, 0 0, 0 0.5, 0 1, 0.5 1, 1...
and can call st_interpolate_aw to compute the area-weighted interpolations:
st_interpolate_aw(sf, c(dashed, box), extensive = TRUE)
## Warning in st_interpolate_aw.sf(sf, c(dashed, box), extensive = TRUE):
## st_interpolate_aw assumes attributes are constant or uniform over areas of x
## Simple feature collection with 2 features and 1 field
## Attribute-geometry relationship: 0 constant, 1 aggregate, 0 identity
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 0 ymin: 0 xmax: 1 ymax: 1
## CRS: NA
## x geometry
## 1 1.75 POLYGON ((0.25 0.125, 0.625...
## 2 15.00 POLYGON ((1 0, 0.5 0, 0 0, ...
st_interpolate_aw(sf, c(dashed, box), extensive = FALSE)
## Warning in st_interpolate_aw.sf(sf, c(dashed, box), extensive = FALSE):
## st_interpolate_aw assumes attributes are constant or uniform over areas of x
## Simple feature collection with 2 features and 1 field
## Attribute-geometry relationship: 0 constant, 1 aggregate, 0 identity
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 0 ymin: 0 xmax: 1 ymax: 1
## CRS: NA
## x geometry
## 1 2.333333 POLYGON ((0.25 0.125, 0.625...
## 2 3.750000 POLYGON ((1 0, 0.5 0, 0 0, ...
This generates a warning, which we can get rid of by setting the agr to constant:
st_agr(sf) = "constant"
st_interpolate_aw(sf, c(dashed, box), FALSE)
## Simple feature collection with 2 features and 1 field
## Attribute-geometry relationship: 0 constant, 1 aggregate, 0 identity
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 0 ymin: 0 xmax: 1 ymax: 1
## CRS: NA
## x geometry
## 1 2.333333 POLYGON ((0.25 0.125, 0.625...
## 2 3.750000 POLYGON ((1 0, 0.5 0, 0 0, ...
Why is it difficult to represent trajectories, sequences of \((x,y,t)\) obtained by tracking moving objects, by data cubes as described in this chapter?
In a socio-economic vector data cube with variables population, life expectancy, and gross domestic product ordered by dimensions country and year, which variables have block support for the spatial dimension, and which have block support for the temporal dimension?
The Sentinel-2 satellites collect images in 12 spectral bands; list advantages and disadvantages to represent them as (i) different data cubes, (ii) a data cube with 12 attributes, one for each band, and (iii) a single attribute data cube with a spectral dimension.
Explain why a curvilinear raster as shown in figure 1.5 can be considered a special case of a data cube.
Explain how the following problems can be solved with data cube operations filter, apply, reduce and/or aggregate, and in which order. Also mention for each which function is applied, and what the dimensionality of the resulting data cube is (if any):
from hourly \(PM_{10}\) measurements for a set of air quality monitoring stations, compute per station the amount of days per year that the average daily \(PM_{10}\) value exceeds 50 \(\mu g/m^3\)
This gives a one-dimensional data cube, with dimension “station”
for a sequence of aerial images of an oil spill, find the time at which the oil spill had its largest extent, and the corresponding extent
This gives a zero-dimensional data cube (a scalar).
from a 10-year period with global daily sea surface temperature (SST) raster maps, find the area with the 10% largest and 10% smallest temporal trends in SST values.
lm)This gives a two-dimensional data cube (or raster layer: the reclassified trend raster).
Find the names of the nc counties that intersect LINESTRING(-84 35,-78 35); use [ for this, and use st_join() for this.
(file = system.file("gpkg/nc.gpkg", package="sf"))
## [1] "/home/edzer/R/x86_64-pc-linux-gnu-library/4.0/sf/gpkg/nc.gpkg"
nc = st_read(file)
## Reading layer `nc.gpkg' from data source
## `/home/edzer/R/x86_64-pc-linux-gnu-library/4.0/sf/gpkg/nc.gpkg'
## using driver `GPKG'
## Simple feature collection with 100 features and 14 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## Geodetic CRS: NAD27
line = st_as_sfc("LINESTRING(-84 35,-78 35)", crs = st_crs(nc))
nc[line,]$NAME
## [1] "Jackson" "Mecklenburg" "Macon" "Sampson" "Cherokee"
## [6] "Cumberland" "Union" "Anson" "Hoke" "Duplin"
## [11] "Richmond" "Clay" "Scotland"
st_join(st_sf(line), nc)$NAME # left join: `line` should be first argument
## [1] "Jackson" "Mecklenburg" "Macon" "Sampson" "Cherokee"
## [6] "Cumberland" "Union" "Anson" "Hoke" "Duplin"
## [11] "Richmond" "Clay" "Scotland"
Repeat this after setting sf_use_s2(FALSE), and compute the difference (hint: use setdiff()), and color the counties of the difference using color ‘#00880088’.
# save names first:
sf_use_s2(TRUE)
names_with_s2 = nc[line,]$NAME
sf_use_s2(FALSE)
## Spherical geometry (s2) switched off
nc[line,]$NAME
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
## [1] "Macon" "Sampson" "Cherokee" "Cumberland" "Union"
## [6] "Anson" "Hoke" "Duplin" "Richmond" "Clay"
## [11] "Scotland"
(diff = setdiff(names_with_s2, nc[line,]$NAME))
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
## [1] "Jackson" "Mecklenburg"
plot(st_geometry(nc))
plot(st_geometry(nc)[nc$NAME %in% diff], col = "#00880088", add = TRUE)
Plot the two different lines in a single plot; note that R will plot a straight line always straight in the projection currently used; st_segmentize can be used to add points on straight line, or on a great circle for ellipsoidal coordinates.
plot(st_geometry(nc))
plot(st_geometry(nc)[nc$NAME %in% diff], col = "#00880088", add = TRUE)
plot(line, add = TRUE)
plot(st_segmentize(line, units::set_units(10, km)), add = TRUE, col = 'red')
NDVI, normalized differenced vegetation index, is computed as (NIR-R)/(NIR+R), with NIR the near infrared and R the red band. Read the L7_ETMs.tif file into object x, and distribute the band dimensions over attributes by split(x, "band"). Then, add attribute NDVI to this object by using an expression that uses the NIR (band 4) and R (band 3) attributes directly.
library(stars)
(x = read_stars(system.file("tif/L7_ETMs.tif", package = "stars")))
## stars object with 3 dimensions and 1 attribute
## attribute(s):
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## L7_ETMs.tif 1 54 69 68.91242 86 255
## dimension(s):
## from to offset delta refsys point values x/y
## x 1 349 288776 28.5 UTM Zone 25, Southern Hem... FALSE NULL [x]
## y 1 352 9120761 -28.5 UTM Zone 25, Southern Hem... FALSE NULL [y]
## band 1 6 NA NA NA NA NULL
(x.spl = split(x))
## stars object with 2 dimensions and 6 attributes
## attribute(s):
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## X1 47 67 78 79.14772 89 255
## X2 32 55 66 67.57465 79 255
## X3 21 49 63 64.35886 77 255
## X4 9 52 63 59.23541 75 255
## X5 1 63 89 83.18266 112 255
## X6 1 32 60 59.97521 88 255
## dimension(s):
## from to offset delta refsys point values x/y
## x 1 349 288776 28.5 UTM Zone 25, Southern Hem... FALSE NULL [x]
## y 1 352 9120761 -28.5 UTM Zone 25, Southern Hem... FALSE NULL [y]
x.spl$NDVI = (x.spl$X4 - x.spl$X3)/(x.spl$X4 + x.spl$X3)
plot(x.spl["NDVI"])
Compute NDVI for the L7_ETMs.tif image by reducing the band dimension, using st_apply and an a function ndvi = function(x) { (x[4]-x[3])/(x[4]+x[3]) }. Plot the result, and write the result to a GeoTIFF.
ndvi_fn = function(x) { (x[4]-x[3])/(x[4]+x[3]) }
ndvi = st_apply(x, 1:2, ndvi_fn)
plot(ndvi)
write_stars(ndvi, "ndvi.tif")
an alternative function is
ndvi_fn = function(x1,x2,x3,x4,x5,x6) { (x4-x3)/(x4+x3) }
ndvi2 = st_apply(x, 1:2, ndvi_fn)
all.equal(ndvi, ndvi2)
## [1] TRUE
This latter function can be much faster, as it is called for chunks of data rather than for individual pixels.
Use st_transform to transform the stars object read from L7_ETMs.tif to EPSG:4326. Print the object. Is this a regular grid? Plot the first band using arguments axes=TRUE and explain why this takes such a long time.
(x_t = st_transform(x, 'EPSG:4326'))
plot(x_t[,,,1], axes = TRUE)
the report says it is a curvilinear grid. Plotting takes so long because for curvilinear grids, each cell is converted to a small polygon and then plotted.
Use st_warp to warp the L7_ETMs.tif object to EPSG:4326, and plot the resulting object with axes=TRUE. Why is the plot created much faster than after st_transform?
x_w = st_warp(x, crs = 'EPSG:4326')
plot(x_w[,,,1], reset = FALSE)
## downsample set to c(0,0,1)
plot(st_as_sfc(st_bbox(x_w)), col = NA, border = 'red', add = TRUE)
Plotting is faster now because we created a new regular grid. Note that the grid border does not align perfectly with the square formed by the bounding box (using straight lines in an equidistant rectangular projection): white grid cells indicate the misalignment due to warping/transforming.
For the S2 image (above), find out in which order the bands are using st_get_dimension_values(), and try to find out (e.g. by internet search) which spectral bands / colors they correspond to.
f = "sentinel/S2A_MSIL1C_20180220T105051_N0206_R051_T32ULE_20180221T134037.zip"
granule = system.file(file = f, package = "starsdata")
file.size(granule)
## [1] 769461997
base_name = strsplit(basename(granule), ".zip")[[1]]
s2 = paste0("SENTINEL2_L1C:/vsizip/", granule, "/", base_name,
".SAFE/MTD_MSIL1C.xml:10m:EPSG_32632")
(p = read_stars(s2, proxy = TRUE))
## stars_proxy object with 1 attribute in 1 file(s):
## $`MTD_MSIL1C.xml:10m:EPSG_32632`
## [1] "[...]/MTD_MSIL1C.xml:10m:EPSG_32632"
##
## dimension(s):
## from to offset delta refsys point values x/y
## x 1 10980 3e+05 10 WGS 84 / UTM zone 32N NA NULL [x]
## y 1 10980 6e+06 -10 WGS 84 / UTM zone 32N NA NULL [y]
## band 1 4 NA NA NA NA B4,...,B8
st_get_dimension_values(p, "band")
## [1] "B4" "B3" "B2" "B8"
Compute NDVI for the S2 image, using st_apply and an an appropriate ndvi function. Plot the result to screen, and then write the result to a GeoTIFF. Explain the difference in runtime between plotting and writing.
ndvi_fn = function(r, g, b, nir) (nir-r)/(nir+r)
ndvi = st_apply(p, 1:2, ndvi_fn)
plot(ndvi)
## downsample set to c(19)
Alternatively, one could use
ndvi_fn = function(r, g, b, nir) (nir-r)/(nir+r)
but that is much less efficient. Write to a tiff:
system.time(write_stars(ndvi, "ndvi.tif"))
## ================================================================================
## user system elapsed
## 221.005 6.905 55.413
The runtime difference is caused by the fact that plot downsamples, so computes a very small fraction of the available pixels, where write_stars computes all pixels, and then writes them.
Plot an RGB composite of the S2 image, using the rgb argument to plot(), and then by using st_rgb() first.
plot(p, rgb = 1:3)
## downsample set to c(19)
plot(st_rgb(p[,,,1:3], maxColorValue=13600))
## downsample set to c(19)
select five random points from the bounding box of S2, and extract the band values at these points. What is the class of the object returned? Convert the object returned to an sf object.
pts = p %>% st_bbox() %>% st_as_sfc() %>% st_sample(5)
(p5 = st_extract(p, pts))
## stars object with 2 dimensions and 1 attribute
## attribute(s):
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## MTD_MSIL1C.xml:10m:EPSG_32632 0 1699.25 2209.5 2021.95 2837.5 3648
## dimension(s):
## from to offset delta refsys point
## geometry 1 5 NA NA WGS 84 / UTM zone 32N TRUE
## band 1 4 NA NA NA NA
## values
## geometry POINT (344398.1 5986860),...,POINT (318560.4 5980671)
## band B4,...,B8
class(p5)
## [1] "stars"
st_as_sf(p5)
## Simple feature collection with 5 features and 4 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 318560.4 ymin: 5911044 xmax: 407394.7 ymax: 5996965
## Projected CRS: WGS 84 / UTM zone 32N
## B4 B3 B2 B8 geometry
## 1 3099 3165 3648 3088 POINT (344398.1 5986860)
## 2 0 0 0 0 POINT (407394.7 5911044)
## 3 2220 2199 2526 1973 POINT (370068.3 5947493)
## 4 1947 1806 2030 1379 POINT (404787.1 5996965)
## 5 2477 2731 3397 2754 POINT (318560.4 5980671)
For the 10 km radius circle around POINT(390000 5940000), compute the mean pixel values of the S2 image when downsampling the images with factor 30, and on the original resolution. Compute the relative difference between the results.
b = st_buffer(st_sfc(st_point(c(390000, 5940000)), crs = st_crs(p)),
units::set_units(10, km))
plot(p[,,,1], reset = FALSE, axes = TRUE)
## downsample set to c(19)
plot(b, col = NA, border = 'green', add = TRUE)
p1 = st_as_stars(p, downsample = 30)
a1 = aggregate(p1, b, mean)
For the full resolution; this takes a while:
system.time(a2 <- aggregate(p, b, mean))
## Warning in c.stars(structure(list(MTD_MSIL1C.xml.10m.EPSG_32632 =
## structure(c(1032.86211707414, : along argument ignored; maybe you wanted to use
## st_redimension?
## user system elapsed
## 77.600 1.037 70.074
Relative differences: we will work on the array of the stars objects:
(a1[[1]] - a2[[1]])/((a1[[1]]+a2[[1]])/2)
## [,1] [,2] [,3] [,4]
## [1,] 0.001055567 0.0009431265 0.001103254 7.781836e-05
Alternatively one could convert a1 and a2 to a data.frame, using as.data.frame, and work on the third column of the data frames.
Use hist to compute the histogram on the downsampled S2 image. Also do this for each of the bands. Use ggplot2 to compute a single plot with all four histograms in facets.
hist(p1)
hist(p1[,,,1])
hist(p1[,,,2])
hist(p1[,,,3])
hist(p1[,,,4])
library(ggplot2)
ggplot(as.data.frame(p1), aes(x = MTD_MSIL1C.xml.10m.EPSG_32632)) +
geom_histogram() + facet_wrap(~band)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Use st_crop to crop the S2 image to the area covered by the 10 km circle. Plot the results. Explore the effect of setting argument crop = FALSE
plot(st_crop(p, b))
## downsample set to c(3)
plot(st_crop(p, b, crop = FALSE))
## downsample set to c(19)
With the downsampled image, compute the logical layer where all four bands have pixel values higher than 1000. Use a raster algebra expression on the four bands (use split first), or use st_apply for this.
p_spl = split(p1)
p_spl$high = p_spl$B4 > 1000 & p_spl$B3 > 1000 & p_spl$B2 > 1000 & p_spl$B8 > 1000
plot(p_spl["high"])
alternative, using st_apply on the band dimension
p2 = st_apply(p1, 1:2, function(x) all(x > 1000))
plot(p2)